Prototype references #6433
base: main
Conversation
Looks good overall. Just a few comments. Feel free to ignore if it's too early.
```python
if mixup_or_cutmix:
    batch_transform = transforms.Compose(
        [
            WrapIntoFeatures(),
```
Don't we have to apply `WrapIntoFeatures` unconditionally, regardless of whether we use mixup/cutmix?
The problem is that `default_collate` does not respect tensor subclasses. Since we apply this transform after collation, we need to wrap here. Of course we could also wrap earlier, but that is not necessary: the input is a plain tensor, which is treated as an image by default, plus an integer label that is ignored entirely.
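For illustration, here is a minimal sketch of the problem. The `Image` stand-in below mimics how the prototype `_Feature` classes unwrap the results of torch operations to plain tensors; the exact `default_collate` behavior can vary across torch versions:

```python
import torch
from torch.utils.data import default_collate

class Image(torch.Tensor):
    # Stand-in for the prototype features.Image: like the _Feature classes,
    # it unwraps the results of torch operations to plain tensors.
    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
        kwargs = kwargs or {}
        with torch._C.DisableTorchFunction():
            return func(*args, **kwargs)

samples = [torch.rand(3, 8, 8).as_subclass(Image) for _ in range(4)]
batch = default_collate(samples)  # stacks via torch.stack internally
print(type(batch))  # <class 'torch.Tensor'> -- the Image subclass is gone
```

Since the type is lost during collation, the batch transform is the first place where the sample can be re-wrapped.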
My question is: what if I don't use mixup or cutmix? Shouldn't we wrap the data into `features.Image` anyway? I might be missing something here. My main point is that since we are testing the new API, we should probably wrap all inputs using their appropriate types and see how the new kernels behave (rather than relying on their default/legacy pure-tensor implementations).
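For concreteness, a sketch of what such unconditional wrapping could look like; the transform below is hypothetical, with `features.Image` and `features.Label` being the prototype feature types under discussion:

```python
import torch
from torchvision.prototype import features

class WrapIntoFeatures(torch.nn.Module):
    # Hypothetical sketch: wrap a plain (image, label) classification sample
    # into prototype feature types so the new kernels are dispatched for it.
    def forward(self, sample):
        image, label = sample
        return features.Image(image), features.Label(label)
```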
We can only wrap after we have converted from PIL. This happens fairly late in the transforms pipeline:

```python
transforms.PILToTensor(),
```
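To illustrate the ordering, here is a hypothetical classification pipeline (the specific transforms before `PILToTensor` are assumptions):

```python
from torchvision import transforms

pipeline = transforms.Compose(
    [
        transforms.RandomResizedCrop(176),  # operates on PIL images
        transforms.RandomHorizontalFlip(),  # still PIL
        transforms.PILToTensor(),           # PIL -> plain uint8 tensor
        # only from this point on can the sample be wrapped into features.Image
    ]
)
```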
I remember @vfdev-5 noting that on CPU the PIL kernels are faster (I don't remember if there was a special case or other constraints; please fill in the blanks). Thus, if we want to optimize for speed, we should probably leave it as is. No strong opinion, though.
I've added the needed changes for the detection references.
@pmeier There seem to be a few issues with the scripts. See below. Might be worth doing a dummy run with very little data to confirm they work.
I've run the references for a few iterations with the following parameters to confirm they work:
Tensor Backend + antialias=True

Reverifying the new API after the speed optimizations. Reference runs at #6433 (comment) and #6433 (comment).

Classification

Augmentation: ta_wide + random erasing + mixup + cutmix
Using githash 959af2d:
Result: Similar accuracy, 11% faster than unoptimized V2.

Augmentation: aa + random erasing
Using githash 959af2d:
Result: Similar accuracy (improvement not statistically significant), 13% faster than unoptimized V2.

Detection

Augmentation: multiscale
Using githash 959af2d:
Result: Similar accuracy and speed.

Augmentation: ssdlite
Using githash 959af2d:
Result: Similar accuracy, 8% faster than unoptimized V2.

Augmentation: ssd
Using githash 959af2d:
Result: Similar accuracy, 4% faster than unoptimized V2.

Augmentation: lsj + copypaste
Using githash 959af2d:
Result: Similar accuracy and speed.

Segmentation

Using githash 959af2d:
Result: Similar accuracy and speed.

Video

New Recipe
Using githash 959af2d:
Result: Similar accuracy; the speed is faster. It's a bit harder to estimate the improvement because the logs indicate an IO slowdown caused by OnTap during at least 3 epochs. The new version is consistently 6-7 minutes per epoch faster than the old, which translates to roughly a 7-8% improvement.
I don't want to merge this PR. It is more like a feature branch that we can use for discussion. For the actual port, we can either clean up this PR or use it as a starting point for a new one.